read.table - Reads a file in table format and creates a data frame from it, with cases corresponding to lines and variables to fields in the file.
read.table(file, header = FALSE, sep = "", dec = ".", row.names, col.names, na.strings = "NA", nrows = -1, skip = 0, comment.char = "#", fileEncoding = "", encoding = "unknown")
file: the name of the file which the data are to be read from.
Each row of the table appears as one line of the file. If it
does not contain an _absolute_ path, the file name is
_relative_ to the current working directory, ‘getwd()’.
Tilde-expansion is performed where supported. This can be a
compressed file (see ‘file’).
header: a logical value indicating whether the file contains the
names of the variables as its first line. If missing, the
value is determined from the file format: ‘header’ is set to
‘TRUE’ if and only if the first row contains one fewer field
than the number of columns.
sep: the field separator character. Values on each line of the
file are separated by this character. If ‘sep = ""’ (the
default for ‘read.table’) the separator is ‘white space’,
that is one or more spaces, tabs, newlines or carriage
returns.
quote: the set of quoting characters. To disable quoting altogether,
use ‘quote = ""’. See ‘scan’ for the behaviour on quotes
embedded in quotes. Quoting is only considered for columns
read as character, which is all of them unless ‘colClasses’
is specified.
dec: the character used in the file for decimal points.
numerals: string indicating how to convert numbers whose conversion to
double precision would lose accuracy, see ‘type.convert’.
Can be abbreviated. (Applies also to complex-number inputs.)
row.names: a vector of row names. This can be a vector giving the
actual row names, or a single number giving the column of the
table which contains the row names, or character string
giving the name of the table column containing the row names.
If there is a header and the first row contains one fewer
field than the number of columns, the first column in the
input is used for the row names. Otherwise if ‘row.names’ is
missing, the rows are numbered.
Using ‘row.names = NULL’ forces row numbering. Missing or
‘NULL’ ‘row.names’ generate row names that are considered to
be ‘automatic’ (and not preserved by ‘as.matrix’).
col.names: a vector of optional names for the variables. The default
is to use ‘"V"’ followed by the column number.
nrows: integer: the maximum number of rows to read in. Negative and
other invalid values are ignored.
skip: integer: the number of lines of the data file to skip before
beginning to read data.
comment.char: character: a character vector of length one containing a
single character or an empty string. Use ‘""’ to turn off
the interpretation of comments altogether.
stringsAsFactors: logical: should character vectors be converted to
factors? Note that this is overridden by ‘as.is’ and
‘colClasses’, both of which allow finer control.
fileEncoding: character string: if non-empty declares the encoding used
on a file (not a connection) so the character data can be
re-encoded. See the ‘Encoding’ section of the help for
‘file’, the ‘R Data Import/Export Manual’ and ‘Note’.
encoding: encoding to be assumed for input strings. It is used to mark
character strings as known to be in Latin-1 or UTF-8 (see
‘Encoding’): it is not used to re-encode the input, but
allows R to handle encoded strings in their native encoding
(if one of those two). See ‘Value’ and ‘Note’.
text: character string: if ‘file’ is not supplied and this is, then
data are read from the value of ‘text’ via a text connection.
Notice that a literal string can be used to include (small)
data sets within R code.
skipNul: logical: should nuls be skipped?
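As a minimal sketch of how `sep`, `dec`, `header` and `text` work together, the following reads a small inline table that uses a semicolon separator and a decimal comma. The column names and values are made up for illustration and are not taken from the tutorial's data file.

```r
# Hypothetical inline data: ';' separates fields, ',' is the decimal mark.
raw <- "Plant;Root_1;Root_2
A;1,5;2,25
B;3,0;4,75"

demo <- read.table(text = raw, sep = ";", dec = ",", header = TRUE)

str(demo)          # Root_1 and Root_2 are numeric; Plant is character (factor in R < 4.0)
demo$Root_1 + 1    # arithmetic works because the decimal comma was parsed
```

If `dec = ","` were left out, the measurement columns would be read as text rather than numbers.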
data <- read.table("data/root_length.csv", sep = ";", dec = ",", header = TRUE)
head(data)
tail(data)
summary is a generic function used to produce result summaries.
summary(data)
table uses the cross-classifying factors to build a contingency table of the counts at each combination of factor levels.
Let's show it for the number of lateral roots:
table(data$Lat_roots)
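To make the result of table() concrete, here is a small sketch on a hypothetical vector of lateral-root counts (the values are invented, not taken from the data file).

```r
# Hypothetical counts of lateral roots for eight plants.
lat_roots <- c(3, 5, 3, 7, 5, 3, 0, 7)

counts <- table(lat_roots)
counts                 # one count per distinct value, sorted by value
names(counts)          # the distinct values, as character strings
counts[["3"]]          # number of plants with exactly 3 lateral roots
```

Note that values which never occur (here 1, 2, 4 and 6) do not appear in the table at all.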
The generic function hist computes a histogram of the given data values.
You can define the number of breaks
breaks: one of:
• a vector giving the breakpoints between histogram cells,
• a function to compute the vector of breakpoints,
• a single number giving the number of cells for the
histogram,
• a character string naming an algorithm to compute the
number of cells (see ‘Details’),
• a function to compute the number of cells.
In the last three cases the number is a suggestion only; as
the breakpoints will be set to ‘pretty’ values, the number is
limited to ‘1e6’ (with a warning if it was larger). If
‘breaks’ is a function, the ‘x’ vector is supplied to it as
the only argument (and the number of breaks is only limited
by the amount of available memory).
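We will explicitly request one cell for each distinct value in the contingency table. Because a single number is only a suggestion, hist() may still choose a different number of cells.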
num_of_breaks=length(table(data$Lat_roots))
hist(data$Lat_roots,breaks = num_of_breaks)
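Since a single number passed to `breaks` is only a suggestion, one possible way to get exactly one cell per integer count is to pass the breakpoints themselves. The sketch below uses a hypothetical vector and `plot = FALSE` so the computed cells can be inspected without drawing anything.

```r
# Hypothetical counts of lateral roots.
lat_roots <- c(3, 5, 3, 7, 5, 3, 0, 7)

# A single number is only a suggestion; hist() picks 'pretty' breakpoints.
h_suggested <- hist(lat_roots, breaks = length(table(lat_roots)), plot = FALSE)
h_suggested$breaks

# Explicit breakpoints at half-integers give exactly one cell per integer.
edges <- seq(min(lat_roots) - 0.5, max(lat_roots) + 0.5, by = 1)
h_exact <- hist(lat_roots, breaks = edges, plot = FALSE)
h_exact$counts
```

Placing the edges at half-integers is a design choice for integer-valued data: each value sits in the middle of its own cell, so no value falls on a boundary.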
We create a new variable called data.group by using cut() to bin the number of lateral roots into the intervals defined by a vector of break points.
cut(x, breaks, labels = NULL,
include.lowest = FALSE, right = TRUE, dig.lab = 3,
ordered_result = FALSE, ...)
x: a numeric vector which is to be converted to a factor by
cutting.
breaks: either a numeric vector of two or more unique cut points or a
single number (greater than or equal to 2) giving the number
of intervals into which ‘x’ is to be cut.
labels: labels for the levels of the resulting category. By default,
labels are constructed using ‘"(a,b]"’ interval notation. If
‘labels = FALSE’, simple integer codes are returned instead
of a factor.
data.group <- cut(
data$Lat_roots,
c(0,5,10,100))
data.contingency_table <- table(data.group)
pie(data.contingency_table)
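One detail worth checking with cut() is what happens at the edges. With the defaults (`right = TRUE`, `include.lowest = FALSE`) the intervals are open on the left, so a value equal to the lowest break point is not assigned to any interval. A small sketch on hypothetical values:

```r
# Hypothetical counts, including a plant with no lateral roots.
lat_roots <- c(0, 3, 5, 7, 10, 12)

# Default: intervals are (0,5], (5,10], (10,100]; the value 0 becomes NA.
cut(lat_roots, c(0, 5, 10, 100))

# include.lowest = TRUE closes the first interval, so 0 lands in [0,5].
grouped <- cut(lat_roots, c(0, 5, 10, 100), include.lowest = TRUE)
table(grouped, useNA = "ifany")
```

Whether `include.lowest = TRUE` is needed depends on the data: it only matters if some plants have exactly 0 lateral roots.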
# Passing labels to cut() directly keeps all three levels, even if one is empty.
data$lateralization_factor <- cut(
data$Lat_roots, c(0,5,10,20),
labels=c("Low","Medium","High")
)
data$lateralization_factor
head(data)
data$length_mean <- rowMeans(data[,2:4])
head(data)
sd(x) calculates the standard deviation of the given vector x.
Since our measurements are spread over several columns and we want to apply sd() to each row, we can use apply().
Returns a vector or array or list of values obtained by applying a
function to margins of an array or matrix.
Usage:
apply(X, MARGIN, FUN, ...)
Arguments:
X: an array, including a matrix.
MARGIN: a vector giving the subscripts which the function will be
applied over. E.g., for a matrix ‘1’ indicates rows, ‘2’
indicates columns, ‘c(1, 2)’ indicates rows and columns.
Where ‘X’ has named dimnames, it can be a character vector
selecting dimension names.
FUN: the function to be applied: see ‘Details’. In the case of
functions like ‘+’, ‘%*%’, etc., the function name must be
backquoted or quoted.
...: optional arguments to ‘FUN’.
X: the columns data[,2:4] (apply() coerces this data frame to a matrix)
MARGIN: 1, since we want to apply the function to each row
FUN: sd, the function we want to apply
data$sd<-apply(data[,2:4],1, sd)
head(data)
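The row-wise mean and standard deviation can be checked on a tiny example. The data frame below is hypothetical (one ID column followed by three measurement columns) and only mirrors the assumed layout of the tutorial's data.

```r
# Hypothetical data frame: one ID column plus three measurement columns.
demo <- data.frame(
  id = c("A", "B", "C"),
  m1 = c(1, 2, 3),
  m2 = c(2, 4, 6),
  m3 = c(3, 6, 9)
)

demo$length_mean <- rowMeans(demo[, 2:4])
demo$sd <- apply(demo[, 2:4], 1, sd)   # MARGIN = 1: one sd() call per row

# The same standard deviation for the first row, computed by hand.
first_row <- unlist(demo[1, 2:4])
sqrt(sum((first_row - mean(first_row))^2) / (length(first_row) - 1))

demo
```

Selecting the columns by position (`2:4`) keeps working after new columns are added, because `$<-` appends them to the end of the data frame.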
write.csv prints its required argument ‘x’ (after converting it to a data frame if it is not one nor a matrix) to a file or connection.
write.csv() is a wrapper around write.table() with sep = "," and dec = "." fixed; write.csv2() uses sep = ";" and dec = "," instead.
write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ", eol = "\n", na = "NA", dec = ".", row.names = TRUE, col.names = TRUE, fileEncoding = "")
x: the object to be written, preferably a matrix or data frame.
If not, it is attempted to coerce ‘x’ to a data frame.
file: either a character string naming a file or a connection open
for writing. ‘""’ indicates output to the console.
append: logical. Only relevant if ‘file’ is a character string. If
‘TRUE’, the output is appended to the file. If ‘FALSE’, any
existing file of the name is destroyed.
quote: a logical value (‘TRUE’ or ‘FALSE’) or a numeric vector. If
‘TRUE’, any character or factor columns will be surrounded by
double quotes. If a numeric vector, its elements are taken
as the indices of columns to quote. In both cases, row and
column names are quoted if they are written. If ‘FALSE’,
nothing is quoted.
sep: the field separator string. Values within each row of ‘x’
are separated by this string.
eol: the character(s) to print at the end of each line (row). For
example, ‘eol = "\r\n"’ will produce Windows' line endings on
a Unix-alike OS, and ‘eol = "\r"’ will produce files as
expected by Excel:mac 2004.
na: the string to use for missing values in the data.
dec: the string to use for decimal points in numeric or complex
columns: must be a single character.
row.names: either a logical value indicating whether the row names of
‘x’ are to be written along with ‘x’, or a character vector
of row names to be written.
col.names: either a logical value indicating whether the column names
of ‘x’ are to be written along with ‘x’, or a character
vector of column names to be written. See the section on
‘CSV files’ for the meaning of ‘col.names = NA’.
fileEncoding: character string: if non-empty declares the encoding to
be used on a file (not a connection) so the character data
can be re-encoded as they are written. See ‘file’.
#write.csv(data,file="data/new_root_length.csv")
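Because the input file was read with `sep = ";"` and `dec = ","`, it may be convenient to write the result in the same format. The sketch below uses write.csv2() for that and writes to a temporary file, so it does not touch the tutorial's data directory; the data frame is hypothetical.

```r
# Hypothetical data frame standing in for the processed data.
demo <- data.frame(id = c("A", "B"), length = c(1.5, 2.25))

out_file <- tempfile(fileext = ".csv")   # temporary path, not the tutorial's file

# write.csv2() uses sep = ";" and dec = ",".
write.csv2(demo, file = out_file, row.names = FALSE)
readLines(out_file)

# Reading it back with the matching settings recovers the numbers.
back <- read.table(out_file, sep = ";", dec = ",", header = TRUE)
back$length
```

With plain write.csv() the file would instead use a comma separator and a decimal point, and would have to be read back with `sep = ","` and `dec = "."`.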